ABSTRACT

Two analysis of variance (ANOVA) models for item
scores are compared. The first is an items by subjects random effects
ANOVA. The second is a mixed effects ANOVA with items fixed and
subjects random. Comparisons regarding reliability, Cronbach's alpha
coefficient, psychometric inference, and inter-item covariance
structure are made between the models. When considering the
inter-item covariance structures for the two ANOVA models, brief
comparisons with factor analysis models are also made. It is
concluded that inference from a sample of items to a population of
items requires homogeneous inter-item covariances, that reliability
has different meanings under the two models, and that while
coefficient alpha is a lower bound for reliability under the second
model, it is not under the first. (Contains 51 references and two
tables.) (Author/SLD)
Abstract

Two ANOVA models for item scores are compared. The first is an items by
subjects random effects ANOVA. The second is a mixed effects ANOVA with items
fixed and subjects random. Comparisons regarding reliability,
Cronbach's $\alpha$ coefficient, psychometric inference, and inter-item covariance
structure are made between the models. When considering the inter-item
covariance structures for the two ANOVA models, brief comparisons with factor
analysis models are also made. It is concluded that inference from a sample
of items to a population of items requires homogeneous inter-item covariances,
that reliability has different meanings under the two models, and that while
coefficient $\alpha$ is a lower bound for reliability under the second model, it is
not under the first.



Key Words: Coefficient Alpha, Covariance Structure, Generalizability,
Introduction

This paper compares two different ANOVA models for items. The first 
model is the two-way items by examinees random effects (Model II) ANOVA. The 
second model is the two-way items by examinees mixed effects (Model III) 
ANOVA. Careful and complete statistical derivations of these models are
given by Scheffé (1956a, 1956b, 1959), and this paper draws heavily on
Scheffé's work. The two ANOVA models are compared to each other in
detail and briefly to factor analysis models. Factor analysis models are
extensively discussed by Harman (1976) and Mulaik (1972). As considered here,
the factor analysis model is statistically more similar to the mixed ANOVA 
model than to the random ANOVA model. Under the factor analysis model, items 
are considered fixed and non-random, while subjects are randomly sampled from 
a population of subjects. See Mulaik and McDonald (1978), Williams (1978), 
and McDonald and Mulaik (1979) for an alternative formulation of the factor 
analysis model. 

All of the models under consideration are linear models. A model is 
defined as linear if an examinee's expected score on an item is a linear 
function of item characteristics. Item characteristics may be fixed 
parameters as in the mixed ANOVA model or random variables as in the random 
ANOVA model. The factor analysis model is here considered to be linear in its 
item parameters which are usually called factor loadings even though these 
linear coefficients are applied to factor scores, which are unobserved random 
variables associated with examinees. An example of a nonlinear model is the 
logistic ogive item characteristic curve model (Lord and Novick, 1968). From 
a theoretical viewpoint, linear models usually do not accurately describe
dichotomously scored items, and most items are so scored. However, for
carefully constructed tests, linear models for item scores are often
sufficiently accurate to provide useful approximations. [See Feldt (1965),
Hsu and Feldt (1969), Hakstian and Whalen (1976), Seeger and Gabrielsson
(1968), Gabrielsson and Seeger (1976), McDonald and Ahlawat (1974), McDonald
(1981, 1985), and Collins, Cliff, McCormick, and Zatkin (1986).]

The discussion of the models presented here will focus on three
characteristics useful in psychometrics. The first is reliability. Under the
three models reliability is defined as the squared correlation between an
observed and a true score. A few relevant references regarding reliability
are Guttman (1945), Novick and Lewis (1967), Bentler (1972), Jackson and
Agunwamba (1977), and Bentler and Woodward (1980, 1983). Parametric
expressions for reliability and Cronbach's (1951) coefficient alpha are given,
and the sampling distribution for the sample alpha coefficient is discussed.
The second characteristic is the inter-item covariance matrix. For each
model, the assumed or resulting covariance structure is discussed and compared
with factor analysis models. Finally, psychometric inference is discussed.
Psychometric inference is considered as statistical inference to a population
of items from a sample of items randomly drawn from the population. The more
general term generalizability is not used since it connotes statistical
inference for a wide array of facets, not just items. There is a large body
of literature on psychometric inference. A few references are Hotelling
(1933), Tryon (1957), Lord and Novick (1968), Cronbach, Gleser, Nanda, and
Rajaratnam (1972), Mulaik (1972), Kaiser and Michael (1975), Rozeboom (1978),
McDonald (1978), and Brennan (1983). Both the approach and the results presented
here, while most similar to those developed by Lord and Novick (1968) and
Cronbach et al. (1972), differ from them in part.

Brief descriptions of seven conclusions original to this paper are:

1. Conditional variances for interaction effects may be heterogeneous in
the random ANOVA model.

2. The random ANOVA model requires the inter-item covariance matrix to
have homogeneous off-diagonal elements, while the mixed ANOVA model
places no restrictions on the inter-item covariance matrix except
positive semi-definiteness. Hence, any factor analysis model may be
subsumed under the mixed ANOVA model but not the random ANOVA model.

3. Interaction effects in the random ANOVA model are analogous to
specific factors in a certain single common factor factor analysis
model, while the examinee main effect is analogous to the single
common factor.

4. The squared correlation between observed scores and true scores is a
useful definition of reliability under the random ANOVA model as well
as under the mixed ANOVA model, but the definition of true score
differs under the two models.

5. Reliability as defined in 4 has different meanings under the two
models. In the mixed ANOVA model, interaction (specific) variance is
included in true score variance, while in the random ANOVA model it
is not.

6. The parametric value of Cronbach's alpha coefficient is a lower bound
to the parametric value of reliability (as defined in 4) under the
mixed ANOVA model but not under the random ANOVA model.

7. Given certain normality assumptions, a transformation of the sample
alpha coefficient has an F distribution under the random ANOVA model.
For the mixed ANOVA model, the F distribution only holds if in
addition to certain normality assumptions there are either no
interactions or the inter-item covariance matrix has special
restricted forms.

The practical implications of these conclusions for the analysis of test data
will be discussed in the last section of this paper.

The Items by Examinees Random ANOVA Model 
The model presented here is essentially the same model developed by
Scheffé (1959, chap. 7). It assumes that a random sample of n items chosen
from a countably infinite population of items is administered to a random
sample of N examinees chosen from a countably infinite population of
examinees. The sampling of items and examinees is assumed to be completely
independent. Let $x_{ij}$ represent subject j's observed score on item i. A
preliminary form of the model is

$$x_{ij} = t_{ij} + e_{ij}, \qquad i = 1, \ldots, n; \quad j = 1, \ldots, N. \tag{1}$$
The quantities $t_{ij}$ and $e_{ij}$ are, respectively, the true score and the error
score of examinee j on item i. Different definitions for true and error
scores under the random ANOVA model will be admitted later. Within the
present context, true and error scores are not absolutes; their definitions
may vary depending on the inferences being made. The various true and error
scores considered in this paper are not necessarily an exhaustive set of
possible true and error scores under the models presented.

If examinee j responds independently and repeatedly to item i, these
replications are indexed by the subscript k. For cognitive tests such random
replications are rarely available, though they occasionally may be obtained
for affective scales. The present development assumes that such replications
are not available from the data. In the theoretical development of the model,
these replications are allowed to be present. In particular, the model
assumes that for the sequence of independent random variables
$e_{ij1}, e_{ij2}, \ldots, e_{ijk}, \ldots$: $E(e_{ijk}) = 0$ for all i, j, and k, and that
$\mathrm{Var}(e_{ijk}) = E(e_{ijk}^2) = \sigma^2(e_{ij})$, i.e., that the error variances are
heterogeneous over the domains of i and j. For notational simplicity, the
subscript k will usually be suppressed, since for the remainder of the paper
it will usually take the value of one.

The above imply that $E_i(e_{ij}) = 0$ and that $E_j(e_{ij}) = 0$, where notation
such as $E_i$ and $\mathrm{Var}_i$ means that the expectation and variance are taken over the
population whose members are indexed by the subscript i. When no subscript is
present the expectation is over random replications. The above also imply
that the true and error scores are uncorrelated, i.e.,
$\mathrm{Cov}_i(t_{ij}, e_{ij}) = \mathrm{Cov}_j(t_{ij}, e_{ij}) = 0$ for all j and i, respectively. It is further
assumed that all errors are independent within and across all populations.
Scheffé (1959, chap. 10) shows that the expressions for expected mean
squares, to be presented later, are valid under the heterogeneity of error
variances indicated above. He also shows that the F distribution theory
invoked later is exactly valid only when the error variances are homogeneous,
but holds approximately when the error variances are mildly heterogeneous if
the design is balanced. This paper assumes that the error variances are only
mildly heterogeneous and that each examinee responds to each item once and
only once. Hence, the design is balanced and the F distribution theory will
be assumed to hold when the appropriate normality assumptions, discussed
later, are invoked.

The following quantities will be used in later developments:

$$E_j(e_{ij}^2) = E_j E(e_{ij}^2) = E_j(\sigma^2(e_{ij})) = \sigma^2(e_i),$$

$$E_i(e_{ij}^2) = E_i E(e_{ij}^2) = E_i(\sigma^2(e_{ij})) = \sigma^2(e_j), \text{ and}$$

$$E_i E_j E(e_{ij}^2) = E_i(\sigma^2(e_i)) = E_j(\sigma^2(e_j)) = \sigma^2(e).$$
The model is further specified by writing

$$t_{ij} = \mu + a_i + b_j + c_{ij}, \tag{2}$$

where $\mu = E_i E_j(t_{ij})$, $a_i = E_j(t_{ij}) - \mu$, $b_j = E_i(t_{ij}) - \mu$, and
$c_{ij} = t_{ij} - E_i(t_{ij}) - E_j(t_{ij}) + \mu$. The overall mean is denoted by $\mu$, while
$a_i$ and $b_j$ denote the main effects due to item i and examinee j,
respectively. The interaction effect due to item i and examinee j is denoted
by $c_{ij}$. These definitions implicitly assume that all items are similarly
scored and hence on the same scale. Scheffé (1959) shows that the above
definitions imply that the model components $a_i$, $b_j$, and $c_{ij}$ have
unconditional and, for the $c_{ij}$, also conditional expectations of zero.
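
The decomposition in (2) can be illustrated numerically. The following is a minimal sketch, not part of the original development: it treats a small finite table of true scores as if it were the entire population of items and examinees (an assumption made only for illustration), so that the expectations over i and j become plain row and column means. The function name `decompose` and the example table are hypothetical.

```python
from statistics import fmean


def decompose(t):
    """Split true scores t[i][j] (items x examinees) into mu, a, b, c.

    Assumption for illustration: the table is the whole population, so
    E_i and E_j in equation (2) reduce to means over rows and columns.
    """
    n, N = len(t), len(t[0])
    item_means = [fmean(row) for row in t]  # E_j(t_ij) for each item i
    examinee_means = [fmean(t[i][j] for i in range(n)) for j in range(N)]  # E_i(t_ij)
    mu = fmean(item_means)  # overall mean (balanced table)
    a = [m - mu for m in item_means]  # item main effects
    b = [m - mu for m in examinee_means]  # examinee main effects
    c = [[t[i][j] - item_means[i] - examinee_means[j] + mu for j in range(N)]
         for i in range(n)]  # interaction effects
    return mu, a, b, c


# Hypothetical true scores for 2 items and 3 examinees.
t = [[1.0, 2.0, 4.0],
     [0.0, 3.0, 2.0]]
mu, a, b, c = decompose(t)
```

In this finite setting the main effects sum to zero, and the interactions sum to zero within every item and within every examinee, which mirrors the zero unconditional and conditional expectations stated above.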

For what follows, it is important to note that the subscripts i and j do
double duty; they are both subscript indices and random variables.
Furthermore, the $a_i$, $b_j$, and $c_{ij}$ are functions of the random variables i and
j. Scheffé introduces additional notation to avoid these double meanings
for the subscripts, but the present paper sacrifices Scheffé's conceptual
clarity for notational economy.

Scheffé (1959, pp. 240-241) shows that certain marginal covariances among
the model components are zero. His derivations are presented here in detail
because of their importance. Writing $c_{i\cdot} = E_j(c_{ij} \mid i)$ and
$c_{\cdot j} = E_i(c_{ij} \mid j)$, Scheffé shows that

$$\sigma(a_i, c_{ij}) = E_i E_j(a_i c_{ij}) = E_i[a_i \, E_j(c_{ij} \mid i)] = E_i(a_i c_{i\cdot}) = 0 \quad \text{because } c_{i\cdot} = 0 \text{ for all } i,$$

$$\sigma(b_j, c_{ij}) = E_j[b_j \, E_i(c_{ij} \mid j)] = E_j(b_j c_{\cdot j}) = 0 \quad \text{because } c_{\cdot j} = 0 \text{ for all } j,$$

$$\sigma(c_{ij}, c_{i'j}) = E_i E_{i'} E_j(c_{ij} c_{i'j}), \quad i \neq i',$$
$$= E_j[E_i E_{i'}(c_{ij} c_{i'j}) \mid j]$$
$$= E_j[E_i(c_{ij} \mid j) \, E_{i'}(c_{i'j} \mid j)]$$
$$= E_j(c_{\cdot j} c_{\cdot j}) = 0 \quad \text{because } c_{\cdot j} = 0 \text{ for all } j,$$

and

$$\sigma(c_{ij}, c_{ij'}) = E_i[E_j(c_{ij} \mid i) \, E_{j'}(c_{ij'} \mid i)], \quad j \neq j',$$
$$= E_i(c_{i\cdot} c_{i\cdot}) = 0 \quad \text{because } c_{i\cdot} = 0 \text{ for all } i.$$

In the above, the notation $E_i E_{i'}$ refers to the expectation over the bivariate
distribution obtained from sampling pairs of items from the population of
items where the members of each pair are distinct.

Scheffé (1959) does not discuss the following model component
conditional covariances:

$$\sigma(a_i, c_{ij} \mid j) = E_i(a_i c_{ij} \mid j),$$

$$\sigma(b_j, c_{ij} \mid i) = E_j(b_j c_{ij} \mid i),$$

$$\sigma(c_{ij}, c_{i'j} \mid i, i') = E_j(c_{ij} c_{i'j} \mid i, i'), \text{ and}$$

$$\sigma(c_{ij}, c_{ij'} \mid j, j') = E_i(c_{ij} c_{ij'} \mid j, j').$$

These conditional covariances are of considerable concern because, as will be
seen later, their values determine the inter-item covariance matrix.

Though a formal proof will not be given, it is asserted here that the
above conditional covariances are also zero under Scheffé's (1959) model.
Four considerations lead to this conclusion. First, it does not appear
possible to generate model component data such that Scheffé's marginal
covariances are zero but the above conditional covariances are not.
Second, Scheffé's proof that the above marginal covariances are zero depends
on the order in which the conditional covariances are taken. If the order is
switched the same result must be found. This implies that the above
conditional covariances must have expected values of zero, and this can occur
only if all are zero or some are positive and some negative such that their
average is zero. Because, as will be shown, these conditional covariances
determine the inter-item covariances, and tests are usually constructed of
items that all intercorrelate positively, it appears more reasonable in a
testing context to assume that the conditional covariances are zero rather
than some positive and some negative. Third, Scheffé (1959, pp. 242-243)
considers the two-way random model interaction components as analogous to the
error terms in a two-way fixed effects model, and these latter have all
conditional covariances equal to zero. Fourth, Cornfield and Tukey (1956) consider
several covariances in the derivation of expected mean squares for factorial
designs, but in the two-way random model these covariances are all zero.

Scheffé (1959) defines the variance components of the model as
$\sigma^2(a) = E_i(a_i^2)$, $\sigma^2(b) = E_j(b_j^2)$, and $\sigma^2(c) = E_i E_j(c_{ij}^2)$. In defining $\sigma^2(c)$,
Scheffé does not consider the interaction conditional variances
$\sigma^2(c_i) = E_j(c_{ij}^2)$ and $\sigma^2(c_j) = E_i(c_{ij}^2)$. Though i and j are assumed to be
statistically independent variables, $c_{ij}$ is a function of both these variables
and for this reason the conditional interaction variances need not be
homogeneous. If it is assumed that the model components have a multivariate
normal distribution, as Scheffé sometimes does, then the model components are
mutually statistically independent and this forces the interaction conditional
variances to be homogeneous. Here they will be considered heterogeneous unless
otherwise specified. Scheffé's (1959, chap. 10) demonstration that his
formulas for expected mean squares are valid under heterogeneity of error
variances implies the same under heterogeneity of interaction conditional
variances.

Of particular interest in the random model ANOVA are the mean squares for
examinees and the mean squares for items by examinees, which are denoted $MS_b$
and $MS_c$, respectively. Scheffé (1959) derives the following expressions for
the expected value of these mean squares: $E_{nN}(MS_b) = n\sigma^2(b) + \sigma^2(c) + \sigma^2(e)$
and $E_{nN}(MS_c) = \sigma^2(c) + \sigma^2(e)$, where $E_{nN}$ denotes that these expectations are
the means of an infinite number of bivariate random samples consisting of n
items and N subjects.

These mean squares are of interest because Hoyt (1941) has shown that the
sample value of Cronbach's (1951) coefficient $\alpha$, denoted $\hat{\alpha}$ herein, is given by
$\hat{\alpha} = (MS_b - MS_c)/MS_b = 1 - (MS_c/MS_b)$. The parametric counterpart of
$\hat{\alpha}$ depends upon the statistical model used to describe the data. For the
random ANOVA model this parameter is denoted $\alpha_{RA}$, the subscript RA denoting
that this definition is specific to the random model ANOVA. The
parameter $\alpha_{RA}$ is defined by

$$\alpha_{RA} = \frac{E_{nN}(MS_b) - E_{nN}(MS_c)}{E_{nN}(MS_b)} = \frac{\sigma^2(b)}{\sigma^2(b) + \sigma^2(c)/n + \sigma^2(e)/n}. \tag{3}$$

The rationale for this definition is that $\hat{\alpha}$ converges in probability to $\alpha_{RA}$
under the RA model. This is discussed further below. Since $\alpha_{RA}$ is defined in
terms of $E_{nN}(MS_b)$ and $E_{nN}(MS_c)$, whose definitions in turn depend upon the RA
model, the definition of $\alpha_{RA}$ is tied to the RA model and hence the RA
subscript. Feldt (1965) has shown that under the additional assumptions of
independent normal distributions for the $\{a_i\}$, $\{b_j\}$, $\{c_{ij}\}$, and $\{e_{ij}\}$,
$(1 - \alpha_{RA})/(1 - \hat{\alpha})$ is distributed as $F[N-1, (n-1)(N-1)]$. Under these
assumptions, the conditional variances for both the interactions and errors [...]

[...] for $\alpha_{RA}$ (Serfling, 1983), and equivalently converges in probability to $\alpha_{RA}$.
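
Hoyt's identity between the sample alpha coefficient and the two ANOVA mean squares can be checked directly. The sketch below is illustrative only: the function names and the small score table are hypothetical, and the table is laid out with items as rows and examinees as columns. It computes the sample alpha once from $MS_b$ and $MS_c$ and once from the familiar item-variance form of Cronbach's coefficient.

```python
from statistics import fmean, variance


def hoyt_alpha(x):
    """Sample alpha as 1 - MS_c / MS_b from scores x[i][j] (items x examinees)."""
    n, N = len(x), len(x[0])
    grand = fmean(v for row in x for v in row)
    item_means = [fmean(row) for row in x]
    examinee_means = [fmean(x[i][j] for i in range(n)) for j in range(N)]
    # Sum of squares for examinees and for the items-by-examinees residual.
    ss_b = n * sum((m - grand) ** 2 for m in examinee_means)
    ss_c = sum((x[i][j] - item_means[i] - examinee_means[j] + grand) ** 2
               for i in range(n) for j in range(N))
    ms_b = ss_b / (N - 1)
    ms_c = ss_c / ((n - 1) * (N - 1))
    return 1.0 - ms_c / ms_b


def cronbach_alpha(x):
    """Sample alpha as n/(n-1) * (1 - sum of item variances / total-score variance)."""
    n, N = len(x), len(x[0])
    totals = [sum(x[i][j] for i in range(n)) for j in range(N)]
    item_variance_sum = sum(variance(row) for row in x)
    return n / (n - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical scores: 3 items (rows) by 5 examinees (columns).
scores = [[2, 4, 3, 5, 1],
          [3, 5, 2, 4, 2],
          [1, 4, 4, 5, 2]]
alpha_from_anova = hoyt_alpha(scores)
alpha_from_variances = cronbach_alpha(scores)
```

The two values agree up to floating point error for any balanced table with nonzero total-score variance, since the within-item sum of squares is the sum of the examinee and residual sums of squares.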
The random ANOVA (RA) model has been presented in some detail. It is now
of interest to compare that model to the factor analysis (FA) model. This
comparison may be made by examining the conditional covariance matrix for the
n sampled items, the conditioning being on the n items selected from the
infinite population of items. Let the observed scores on the n items be
represented by the column vector $\mathbf{x}_j$. The conditional covariance matrix is
$\Sigma_{x|n} = E_j[(\mathbf{x}_j - E_j(\mathbf{x}_j))(\mathbf{x}_j - E_j(\mathbf{x}_j))']$. The diagonal elements of this matrix
are $\mathrm{Var}_j(x_{ij}) = \sigma^2(b) + \sigma^2(c_i) + \sigma^2(e_i)$. Because it is assumed here that
under the RA model $\mathrm{Cov}_j(c_{ij}, c_{i'j}) = 0$ for any pair of items randomly
selected from the population of items, it follows that this covariance will
be zero for all pairs of items in the randomly selected sample of n items,
and consequently that the off-diagonal elements of this matrix are
$\mathrm{Cov}_j(x_{ij}, x_{i'j}) = \sigma^2(b)$. The rather simple form of this conditional
covariance matrix may be represented as $\Sigma_{x|n} = \sigma^2(b)J + \Delta[\sigma^2(c_i) + \sigma^2(e_i)]$,
where J represents a matrix of all ones and $\Delta$ is a diagonal matrix with the
indicated elements. It follows that the conditional covariance matrix for the
true scores on the n items is

$$\Sigma_{t|n} = \sigma^2(b)J + \Delta(\sigma^2(c_i)).$$
Hocking (1985) presents covariance structures for a wide variety of random and
mixed ANOVA models. He assumes homogeneity among the error and conditional
interaction variances. Given his assumptions, his results agree with those
presented here.

The RA conditional covariance structure is identical to the covariance
structure of a one common factor FA model with homogeneous factor loadings and
n specific factors distinct from the errors. This is Spearman's (1904) model
but with the additional restriction that the items all correlate equally with
the general factor. More specifically, the subject main effect variance in
the RA model is analogous to the common factor variance in the FA model while
the conditional interaction variances in the RA model are analogous to
specific variances in the FA model. Another way to characterize this
conditional covariance structure is as an essentially tau-equivalent model
(Lord and Novick, 1968) but with the addition of n specific factors with
possibly heterogeneous variances.

If the specific factors have homogeneous variances, then the conditional
covariance structure for the true scores is equivalent to the equicorrelation
model (Morrison, 1976). Under the equicorrelation model, the first and
largest eigenvalue of $\Sigma_{t|n}$, denoted $\lambda_1$, is equal to $n\sigma^2(b) + \sigma^2(c)$. The
second distinct eigenvalue of $\Sigma_{t|n}$ has multiplicity n-1 and is given
by $\sigma^2(c)$. It is denoted $\lambda_2$.
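
The two eigenvalues of the equicorrelation structure can be verified without an eigenvalue routine by multiplying the matrix into its known eigenvectors. The sketch below assumes homogeneous specific variances; the helper names and the numerical values of the variance components are hypothetical.

```python
def equicorrelation_matrix(n, var_b, var_c):
    """True-score covariance matrix var_b * J + var_c * I for n items."""
    return [[var_b + (var_c if i == k else 0.0) for k in range(n)]
            for i in range(n)]


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in m]


n, var_b, var_c = 4, 2.0, 0.5  # hypothetical values
sigma_t = equicorrelation_matrix(n, var_b, var_c)

# The vector of ones is an eigenvector with eigenvalue n*var_b + var_c.
ones = [1.0] * n
image_of_ones = matvec(sigma_t, ones)

# Any vector whose elements sum to zero is an eigenvector with eigenvalue var_c.
contrast = [1.0, -1.0] + [0.0] * (n - 2)
image_of_contrast = matvec(sigma_t, contrast)
```

Because the vectors whose elements sum to zero span an (n-1)-dimensional space, the second eigenvalue has multiplicity n-1, as stated above.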

The simple form of the conditional covariance matrix in the RA model
results from the uncorrelatedness of the model components. Though this
covariance structure is a rather restricted special case of the many more
versatile covariance structures permitted by FA models, the RA model permits
explicit statistical inference to a population of items. The price for this
gain in "generalizability" is the assumption of a simple covariance structure
among the items.

The inferential differences between considering items random and
considering items fixed may be illustrated by how reliability may be defined
under these conditions. For subject j, let the item domain true score be
defined as $\tau_j = E_i(x_{ij}) = \mu + b_j$. This implies that the item domain error
score for subject j is $\varepsilon_j = x_{\cdot j} - \tau_j = \bar{a} + \bar{c}_j + \bar{e}_j$, where $x_{\cdot j}$ is
subject j's mean score over the n sampled items and the bars denote means over
those items. Note that for random
replications $E(\varepsilon_j) = \bar{a} + \bar{c}_j$, and that for examinees $E_j(\varepsilon_j) = \bar{a}$.
Furthermore, considering just a one-item test, $\mathrm{Cov}_i(\varepsilon_j, \varepsilon_{j'}) = \sigma^2(a)$
for all $j \neq j'$. These conditions violate the usual assumptions of
classical test theory (Lord and Novick, 1968, chap. 3), because here the
errors do not have means of zero and the errors are inter-correlated.
However, $\mathrm{Cov}_j(\tau_j, \varepsilon_j) = 0$ and this crucial result implies that if interest
focuses on the reliability of a specific test composed of n randomly selected
items with respect to the item domain true scores, then a useful definition of
reliability is $\mathrm{Rel}(x_{\cdot j}, \tau_j) = [\mathrm{Cor}_j(x_{\cdot j}, \tau_j)]^2$. Reliability so defined
measures the accuracy with which relationships between observed test scores
are indicative of relationships between item domain true scores.
Since $\mathrm{Cov}_j(x_{\cdot j}, \tau_j) = \sigma^2(b)$,

$$\mathrm{Var}_j(x_{\cdot j}) = \sigma^2(b) + (1/n^2)\sum_i^n \sigma^2(c_i) + (1/n^2)\sum_i^n \sigma^2(e_i),$$

and $\mathrm{Var}_j(\tau_j) = \sigma^2(b)$, it follows that

$$\mathrm{Rel}(x_{\cdot j}, \tau_j) = \frac{\sigma^2(b)}{\sigma^2(b) + (1/n^2)\sum_i^n \sigma^2(c_i) + (1/n^2)\sum_i^n \sigma^2(e_i)} = \mathrm{Var}_j(\tau_j)/\mathrm{Var}_j(x_{\cdot j}), \tag{5}$$
which is the usual ratio of true score variance to observed score variance. If
the error variances and the conditional interaction variances are homogeneous
then $\alpha_{RA} = \mathrm{Rel}(x_{\cdot j}, \tau_j)$; otherwise $\alpha_{RA}$ is only an approximation to this
reliability, albeit not a bad one.
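
The relationship between equations (3) and (5) can be made concrete with a small numerical sketch. The function names and all variance values below are hypothetical. The first function uses the population variance components, while the second uses the conditional variances of the n items that happen to be on the test.

```python
def alpha_ra(var_b, var_c, var_e, n):
    """Equation (3): parametric alpha under the RA model for an n-item test."""
    return var_b / (var_b + var_c / n + var_e / n)


def rel_domain(var_b, var_c_items, var_e_items):
    """Equation (5): reliability of one specific test with respect to the
    item domain true score; the lists hold per-item conditional variances."""
    n = len(var_c_items)
    return var_b / (var_b + sum(var_c_items) / n**2 + sum(var_e_items) / n**2)


n = 5
var_b, var_c, var_e = 1.0, 0.4, 1.0  # hypothetical population components

parametric_alpha = alpha_ra(var_b, var_c, var_e, n)

# Homogeneous case: every sampled item has the population variances.
rel_homogeneous = rel_domain(var_b, [var_c] * n, [var_e] * n)

# Heterogeneous case: the sampled items' interaction variances average 0.46,
# not the population value 0.40, so (5) and (3) no longer coincide.
rel_heterogeneous = rel_domain(var_b, [0.2, 0.3, 0.4, 0.5, 0.9], [var_e] * n)
```

In the heterogeneous case the two quantities differ only through the gap between the sampled items' average conditional variances and the population averages, which is why the approximation tends to be close.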

An alternative definition of reliability under the RA model, which is more
appropriate when concern is not with the reliability of a particular randomly
constructed test but rather with the population of such tests, is
$E_n[\mathrm{Rel}(x_{\cdot j}, \tau_j)]$. Here, $E_n$ denotes that the expectation is over the
population of randomly constructed tests consisting of n items. This
definition of reliability is appropriate when the same test will be
administered to every examinee, but concern is with the reliability of any
randomly constructed test rather than a particular test that is randomly
selected. The situation in which different examinees take different randomly
constructed test forms is not often encountered in practice and is not
addressed in this paper (but see Lord and Novick, 1968, p. 208). If the error
variances and the conditional interaction variances are homogeneous, then
$E_n[\mathrm{Rel}(x_{\cdot j}, \tau_j)] = \alpha_{RA}$. This follows since $\mathrm{Rel}(x_{\cdot j}, \tau_j) = \alpha_{RA}$ for each and
every randomly constructed test consisting of n items. If homogeneity does
not hold, an exact expression for $E_n[\mathrm{Rel}(x_{\cdot j}, \tau_j)]$ requires additional model
specifications which will not be attempted in this paper. However, it may be
shown by using the delta method of Kendall and Stuart (1977, Vol. I) that $\alpha_{RA}$
is a first order approximation for $E_n[\mathrm{Rel}(x_{\cdot j}, \tau_j)]$ under heterogeneity.

If the data are accurately described by the RA model, but the usual
definition of reliability (Lord and Novick, 1968, chap. 3) is adopted, then

$$\mathrm{Rel}(x_{\cdot j}, t_{\cdot j}) = [\mathrm{Cor}_j(x_{\cdot j}, t_{\cdot j})]^2 = \mathrm{Var}_j(t_{\cdot j})/\mathrm{Var}_j(x_{\cdot j}) = \frac{\sigma^2(b) + (1/n^2)\sum_i^n \sigma^2(c_i)}{\sigma^2(b) + (1/n^2)\sum_i^n \sigma^2(c_i) + (1/n^2)\sum_i^n \sigma^2(e_i)}. \tag{6}$$
Usually, $\mathrm{Rel}(x_{\cdot j}, t_{\cdot j}) > \alpha_{RA}$. However, if there is no item by examinee
interaction and the error variances are homogeneous then $\mathrm{Rel}(x_{\cdot j}, t_{\cdot j}) = \alpha_{RA}$.

A comparison of (6) to (5) shows that the interaction (specific)
variances are included in the numerator of $\mathrm{Rel}(x_{\cdot j}, t_{\cdot j})$ but excluded from the
numerator of $\mathrm{Rel}(x_{\cdot j}, \tau_j)$. This difference is due to the difference in
definitions between $t_{\cdot j}$ and $\tau_j$. If the true score is specific to the test,
i.e., $t_{\cdot j}$, then the interaction (specific) variances are included in the true
score variance. When the true score is defined over the population of items,
i.e., $\tau_j$, then the interaction (specific) variances do not contribute to the
true score variance.

Two brief observations regarding the RA model are of interest. First, if no
interactions are present the RA model may be viewed as a linear analog of the
one parameter Rasch model (Lord and Novick, 1968, p. 402) with explicit item
and examinee sampling. Second, the symmetry of the RA model allows
consideration of not only the inter-item covariance matrix but also the
similarly constrained inter-examinee covariance matrix.

This section of the paper has presented a detailed development of the RA
model and a brief comparison of the RA model to the FA model. The development
demonstrates that under the RA model generalization in a statistical manner
over a population of items requires a simple and specialized covariance
structure among the items. In the next section, the mixed ANOVA (MA) model is
considered.
The Items by Examinees Mixed ANOVA Model
Hocking (1973) compares three different versions of the two-way mixed
ANOVA (MA) model that have been presented in the statistical literature, and
resolves the differences between their associated expressions for expected
mean squares. This paper adopts the most general one of these three, which is
due to Scheffé (1959). In the mixed ANOVA model, the N examinees are
randomly sampled from an infinite population of examinees, but the n items are
considered fixed and non-random. Even though the items may be randomly chosen
from a population of items, this fact is ignored; the MA model simply is not
concerned with statistical inferences to a population of items. All
statistical inferences are conditional on the n items selected, since the
population of items is not defined in the MA model.
The model may be written as

$$x_{ij} = t_{ij} + e_{ij}, \qquad i = 1, \ldots, n; \quad j = 1, \ldots, N,$$

where $t_{ij} = \mu + \alpha_i + b_j + c_{ij}$. The model assumes that the error scores have
zero means for all i and j and this implies that the true and error scores are
uncorrelated. The non-random parameters $\mu$ and $\alpha_i$ represent the overall mean
and the main effect of item i, respectively. The random variable $b_j$
represents the main effect due to examinee j, while the random variable $c_{ij}$
represents the effect due to the interaction of examinee j with item i. These
model components are defined as $\mu = E_j[(1/n)\sum_i^n t_{ij}] = E_j(t_{\cdot j})$,
$\alpha_i = E_j(t_{ij}) - \mu$, $b_j = t_{\cdot j} - \mu$, and $c_{ij} = t_{ij} - t_{\cdot j} - E_j(t_{ij}) + \mu$.
The above definitions imply that the model components will satisfy the
following conditions: $\sum_i^n \alpha_i = \sum_i^n c_{ij} = E_j(b_j) = E_j(c_{ij}) = 0$.
It is also implicitly assumed that the items are similarly scored and hence on
the same scale. Allowing for heterogeneous error variances yields the
following: $\sigma^2(e_i) = E_j(e_{ij}^2)$ and $\sigma^2(e) = (1/n)\sum_i^n \sigma^2(e_i)$.
If the error variances are homogeneous, then $\sigma^2(e_i) = \sigma^2(e)$ for all i.

Let $\mathbf{t}_j$ represent the n dimensional column vector of examinee j's true
scores on the n items. The true score covariance matrix is
$\Sigma_t = [\sigma_{ii'}] = E_j[(\mathbf{t}_j - E_j(\mathbf{t}_j))(\mathbf{t}_j - E_j(\mathbf{t}_j))']$. The only restriction placed
on $\Sigma_t$ is that it be positive semi-definite. The covariance among the items may
be of a very general form, including any multiple common factor model. This
is quite different from the RA model where a simple specific conditional
covariance structure is assumed. Removing the randomness of the items permits
a much more general covariance structure among the items, but eliminates any
statistical inferences concerning the population of items.

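From the definitions of the random model components, the variances and
covariances for these components may be expressed as functions of the
$\{\sigma_{ii'}\}$. Writing a dot subscript for the average over that index (so that
$\sigma_{i\cdot}$ is the mean of row i of $\Sigma_t$ and $\sigma_{\cdot\cdot}$ is the mean of all its elements),
Scheffé (1959) shows that

$$\mathrm{Var}_j(b_j) = E_j(b_j^2) = \sigma_{\cdot\cdot}, \tag{7}$$

$$\mathrm{Cov}_j(c_{ij}, c_{i'j}) = E_j(c_{ij} c_{i'j}) = \sigma_{ii'} - \sigma_{i\cdot} - \sigma_{\cdot i'} + \sigma_{\cdot\cdot}, \text{ and} \tag{8}$$

$$\mathrm{Cov}_j(b_j, c_{ij}) = E_j(b_j c_{ij}) = \sigma_{i\cdot} - \sigma_{\cdot\cdot}. \tag{9}$$

Scheffé (1959) defines the variance components as

$$\sigma^2(b) = \mathrm{Var}_j(b_j) \text{ and} \tag{10}$$

$$\sigma^2(c) = [1/(n-1)]\sum_i^n \mathrm{Var}_j(c_{ij}) = [1/(n-1)]\sum_i^n (\sigma_{ii} - \sigma_{\cdot\cdot}). \tag{11}$$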
Using these definitions, he shows that $MS_b$ and $MS_c$, as previously defined
under the RA model, have the following expected values under the MA model:
$E_N(MS_b) = n\sigma^2(b) + \sigma^2(e)$ and $E_N(MS_c) = \sigma^2(c) + \sigma^2(e)$, where $E_N$ denotes the
expectation over an infinite number of random samples of N examinees.
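
Equations (7), (10), and (11) give the MA variance components directly from the true score covariance matrix. The sketch below is illustrative only: it reads the doubly dotted quantity in (7) as the mean of all elements of the matrix (Scheffé's averaging convention, stated here as an assumption), and the function name and example matrices are hypothetical.

```python
def ma_variance_components(sigma_t):
    """Return (var_b, var_c) for an n x n true score covariance matrix,
    following equations (7), (10), and (11)."""
    n = len(sigma_t)
    # Assumed reading of the dot notation: average over all n*n elements.
    sigma_dotdot = sum(sum(row) for row in sigma_t) / n**2
    var_b = sigma_dotdot
    var_c = sum(sigma_t[i][i] - sigma_dotdot for i in range(n)) / (n - 1)
    return var_b, var_c


# Essentially tau-equivalent items: every variance and covariance equals 2.0,
# so there is no interaction variance.
tau_equivalent = [[2.0] * 4 for _ in range(4)]
var_b_tau, var_c_tau = ma_variance_components(tau_equivalent)

# Homogeneous covariances of 2.0 with item variances of 2.5.
compound = [[2.5 if i == k else 2.0 for k in range(4)] for i in range(4)]
var_b_cs, var_c_cs = ma_variance_components(compound)
```

For the second matrix the examinee component is 2.125 rather than 2.0, because under the MA definitions the examinee main effect is the mean true score over the n fixed items and so absorbs part of the item-specific variance.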

It is interesting to note that the random components are correlated in
the MA model and that these correlations are determined by $\Sigma_t$. In the RA
model the random components are uncorrelated, but the covariances among the
items are required to be homogeneous. What happens to the component
correlations in the MA model when the inter-item covariances are assumed to be
homogeneous will be investigated shortly.

First, however, reliability and its relationship to coefficient alpha
will be discussed. The sample alpha coefficient under the MA model is
identical to the sample alpha for the RA model, and is given
as $\hat{\alpha} = (MS_b - MS_c)/MS_b$. Its parametric counterpart under the MA model will be
denoted by $\alpha_{MA}$ and is defined as

$$\alpha_{MA} = \frac{E_N(MS_b) - E_N(MS_c)}{E_N(MS_b)} = \frac{\sigma^2(b) - \sigma^2(c)/n}{\sigma^2(b) + \sigma^2(e)/n}. \tag{12}$$

The rationale for this definition is that $\hat{\alpha}$ converges in probability
to $\alpha_{MA}$ under the MA model. This is further discussed below. If (1) the
random model components including the errors are normally distributed, (2) the
error variances are homogeneous (though mild heterogeneity should be
acceptable), and (3) $\sigma^2(c) = 0$, then using results given by Scheffé (1959)
it may be shown that $(1 - \alpha_{MA})/(1 - \hat{\alpha})$ is distributed as
$F[N-1, (n-1)(N-1)]$, which is the same distribution as under the RA model.
Similarly, this F distribution implies that
$E_N(\hat{\alpha}) = [(N-1)/(N-3)]\alpha_{MA} - [2/(N-3)]$, and hence that
$\hat{\alpha}$ is an asymptotically unbiased and consistent estimate of
$\alpha_{MA}$. Kristof (1963) has previously derived these results. If
$\sigma^2(c) \neq 0$, then the F distribution still holds if $\Sigma_t$ has the
highly symmetric structure discussed by Scheffé (1959, p. 264) or if $\Sigma_x$
has the type H form described by Huynh and Feldt (1970); but as will be seen
later $\alpha_{MA}$ is then a strict lower bound to reliability. However, even if
the foregoing assumptions are not fulfilled, $\hat{\alpha}$ is still a consistent
estimator of $\alpha_{MA}$ since it is a method of moments estimate for
$\alpha_{MA}$ (Serfling, 1983), and equivalently converges in probability to
$\alpha_{MA}$. Finally, it should be noted that if $\sigma^2(c) = 0$, then all the
$c_{ij} = 0$ and the MA model is identical to the essentially tau equivalent
model discussed by Lord and Novick (1968).
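The small-sample bias implied by the expectation formula above can be made concrete. The sketch below simply evaluates $E_N(\hat{\alpha}) = [(N-1)/(N-3)]\alpha_{MA} - 2/(N-3)$ for a few sample sizes; the function name and the numerical values are illustrative assumptions, and the formula is taken to hold only under the normality and homogeneity conditions stated in the text.

```python
def expected_alpha_hat(alpha_ma, N):
    """Expected value of sample alpha given the parameter alpha_MA and N examinees.

    Valid only for N > 3, and only under the F distribution conditions
    described in the text (an assumption of this sketch).
    """
    if N <= 3:
        raise ValueError("the expectation requires N > 3")
    return (N - 1) / (N - 3) * alpha_ma - 2 / (N - 3)


# Illustrative parameter value; the bias is 2 * (alpha_MA - 1) / (N - 3).
alpha_ma = 0.8
for N in (10, 23, 100, 1000):
    e = expected_alpha_hat(alpha_ma, N)
    print(N, e, e - alpha_ma)
```

Because the bias equals $2(\alpha_{MA} - 1)/(N - 3)$, it is never positive and vanishes as N grows, which is the sense in which the estimate is asymptotically unbiased.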

Under the MA model, the mean true score of examinee j is
$t_{.j} = (1/n) \sum_i E_k(x_{ij})$ where, as discussed under the RA model, $E_k$
denotes expectation over the errors associated with random replications.
Let $x_j$ denote the n dimensional column vector of the j-th examinee's observed
scores on the n items. Let $\Sigma_x$ denote the covariance matrix for the
observed scores. It follows that $\Sigma_x = \Sigma_t + \Delta(\sigma^2(e_i))$ where
$\Delta(\sigma^2(e_i))$ is a diagonal matrix with the error variances as its
elements. Following Lord and Novick (1968, chap. 3), reliability under the MA
model is defined as

    $\mathrm{Rel}(x_{.j}, t_{.j}) = [\mathrm{Cor}_j(x_{.j}, t_{.j})]^2 = \sigma^2(b)/[\sigma^2(b) + \sigma^2(e)/n]$    (13)
                                  $= \mathrm{Var}_j(t_{.j})/\mathrm{Var}_j(x_{.j})$ .

The above follows from the expressions for the variance components given in
(10) and (11). Comparison of the last expression in the first line of (13)
with the expression for $\alpha_{MA}$ given in (12) demonstrates
that $\alpha_{MA} = \mathrm{Rel}(x_{.j}, t_{.j})$ if and only if $\sigma^2(c) = 0$,
i.e., the items are essentially tau-equivalent. Otherwise,
$\alpha_{MA} < \mathrm{Rel}(x_{.j}, t_{.j})$. This agrees with the results of
Guttman (1945), Novick and Lewis (1967), Bentler (1972), and
Jackson and Agunwamba (1977).
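The comparison of (12) with (13) can be checked numerically. The following is a minimal sketch that evaluates both expressions from assumed variance components; the function names and the numerical values are hypothetical and serve only to illustrate that the two quantities coincide when $\sigma^2(c) = 0$ and that alpha falls below reliability otherwise.

```python
def alpha_ma(var_b, var_c, var_e, n):
    """Parametric alpha under the MA model, equation (12)."""
    return (var_b - var_c / n) / (var_b + var_e / n)


def rel_ma(var_b, var_e, n):
    """Reliability of the mean observed score under the MA model, equation (13)."""
    return var_b / (var_b + var_e / n)


# Illustrative values (assumptions): n items, examinee and error variances.
n, var_b, var_e = 10, 1.0, 2.0
for var_c in (0.0, 0.5, 1.0):
    a = alpha_ma(var_b, var_c, var_e, n)
    r = rel_ma(var_b, var_e, n)
    # The gap r - a equals (var_c / n) / (var_b + var_e / n).
    print(var_c, a, r, r - a)
```

The gap between the two quantities is $[\sigma^2(c)/n]/[\sigma^2(b) + \sigma^2(e)/n]$, so it grows with the interaction variance and shrinks as the number of items increases.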

Under the assumption of equivalent covariance structures for the RA and
MA models, comparisons between the two models regarding variance components,
reliability, and coefficient alpha will now be undertaken. The RA true score
conditional covariance structure given in (4) may be re-expressed as

    $\Sigma_t = qJ_n + \Delta(u_i^2)$    (14)

where $\sigma^2(b) = q$ and $\sigma^2(c_i) = u_i^2$. The following true score
covariance structure will be assumed for the MA model:

    $\Sigma_t = qJ + \Delta(u_i^2)$ .    (15)
For the above covariance structure, Table 1 displays the variance
components for the RA and MA models. This paper has followed the convention
of labeling the variance components the same in both models, but Table 1 shows
that the variance components have different meanings under the two models.
While $\sigma^2(c)$ depends only on the specific variances, though in different
ways in the two models, $\sigma^2(b)$ includes common and specific variances
under the MA model but only common variance under the RA model. For covariance
structures under the MA model that are more complicated than the one assumed
here, such simple relationships between the variance components and the
covariance matrix are not apparent.

Insert Table 1 about here

The differences in variance components between the two models have
ramifications for reliability and coefficient alpha under the two models.
Table 2 displays alpha and reliabilities for the two models under the
indicated covariance structure. Coefficient alpha differs statistically under
the two models in that expectations are used in the denominator of
$\alpha_{RA}$ while summations are used in the denominator of $\alpha_{MA}$.
Nonetheless, coefficient alpha has a similar psychometric meaning under the
two models since under both models the numerator and denominator depend, with
slight variations, on the same elements of the covariance matrix.
$\mathrm{Rel}(x_{.j}, t_{.j})$ is identical under the two models, but differs
from $\mathrm{Rel}(x_{.j}, \tau_j)$ under the RA model as has already been noted.

Insert Table 2 about here

Under the RA model, the random model components are uncorrelated as was
previously discussed. For the MA model under the covariance structure in
(15),

    $\mathrm{Cov}_j(b_j, c_{ij}) = [q + (u_i^2/n)] - [q + (1/n^2)\sum_k u_k^2]$
                                $= [u_i^2 - (1/n)\sum_k u_k^2]/n$ and

    $\mathrm{Cov}_j(c_{ij}, c_{i'j}) = q - (q + u_i^2/n) - (q + u_{i'}^2/n) + [q + (1/n^2)\sum_k u_k^2]$
                                   $= [(1/n)\sum_k u_k^2 - u_i^2 - u_{i'}^2]/n$ .

If all the $u_i^2$ are equal, then $\mathrm{Cov}_j(b_j, c_{ij}) = 0$ and
$\mathrm{Cov}_j(c_{ij}, c_{i'j}) = -u^2/n$ where $u^2$ is the common value for all
the $u_i^2$. The covariance $-u^2/n$ is due to the fact that under the MA model
$\sum_i c_{ij} = 0$ for all j. As was noted
previously for the RA model, the uncorrelatedness of the random model
components results in the simple covariance structure given in (4) and (14).
What has just been shown is that when a slightly simpler covariance structure 
is assumed for the MA model, the random model components essentially become 
uncorrelated. Hence, the correlations among the random components and the 
inter-item covariances are related in a similar fashion under both models. To 
obtain psychometric inference under a more complicated inter-item covariance 
structure than (14) requires an RA type model which permits the model 
components to be correlated. Such correlations would make expressions for the 
mean squares much more difficult to obtain. 
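The component covariances derived above for the MA model can be verified directly from a covariance matrix. The sketch below is illustrative only: it builds $\Sigma_t = qJ + \Delta(u_i^2)$ for assumed values of q and the $u_i^2$, then computes $\sigma^2(b)$, $\sigma^2(c)$, and the component covariances from the row means and grand mean of the matrix, following the definitions attributed to Scheffé (1959) in the text. The function names are hypothetical.

```python
def covariance_structure(q, u2):
    """Build Sigma_t = q*J + diag(u2) as a list of rows."""
    n = len(u2)
    return [[q + (u2[i] if i == k else 0.0) for k in range(n)] for i in range(n)]


def ma_components(sigma):
    """Variance components and component covariances under the MA model.

    Uses sigma_i. (row means) and sigma_.. (grand mean) of the matrix:
      Var(b)         = sigma_..
      Cov(b, c_i)    = sigma_i. - sigma_..
      Cov(c_i, c_k)  = sigma_ik - sigma_i. - sigma_k. + sigma_..
      sigma^2(c)     = sum_i Var(c_i) / (n - 1)
    """
    n = len(sigma)
    row_mean = [sum(row) / n for row in sigma]
    grand = sum(row_mean) / n
    cov_bc = [row_mean[i] - grand for i in range(n)]
    cov_cc = [
        [sigma[i][k] - row_mean[i] - row_mean[k] + grand for k in range(n)]
        for i in range(n)
    ]
    var_c = sum(cov_cc[i][i] for i in range(n)) / (n - 1)
    return grand, var_c, cov_bc, cov_cc


# Heterogeneous specific variances (illustrative values).
var_b, var_c, cov_bc, cov_cc = ma_components(covariance_structure(2.0, [1.0, 2.0, 3.0]))
print(var_b, var_c, cov_bc, cov_cc[0][1])

# Homogeneous specific variances: Cov(b, c_i) vanishes, Cov(c_i, c_k) = -u^2/n.
_, _, cov_bc_eq, cov_cc_eq = ma_components(covariance_structure(2.0, [2.0, 2.0, 2.0]))
print(cov_bc_eq, cov_cc_eq[0][1])
```

With unequal $u_i^2$ the examinee and interaction components are correlated, and with equal $u_i^2$ only the covariance forced by the constraint $\sum_i c_{ij} = 0$ remains.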

Finally, when the $u_i^2$ are homogeneous and hence the equicorrelation
covariance structure presented by Morrison (1976) (that is equivalent to
Scheffé's (1959) highly symmetric covariance structure) holds, then
$n\sigma^2(b) = \lambda_1$ where $\lambda_1$ is the first and largest eigenvalue of
$\Sigma_t$ in the MA model. The one remaining distinct eigenvalue of $\Sigma_t$,
$\lambda_2$, has multiplicity n-1 and is equal to $\sigma^2(c)$.
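The eigenvalue relationships for the equicorrelation structure can be checked without an eigenvalue routine, by confirming that the vector of ones and the contrasts between the first item and each other item are eigenvectors. This is a minimal sketch with assumed values of n, q, and the common $u^2$; the helper name is hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Illustrative values (assumptions): n items, common covariance q, common u^2.
n, q, u2 = 4, 1.5, 0.5
sigma = [[q + (u2 if i == k else 0.0) for k in range(n)] for i in range(n)]

# Variance components from equations (7), (10), and (11).
grand = sum(sum(row) for row in sigma) / n ** 2            # sigma_.. = sigma^2(b)
var_b = grand
var_c = sum(sigma[i][i] - grand for i in range(n)) / (n - 1)

# The vector of ones is an eigenvector with eigenvalue n*q + u^2.
lam1 = n * q + u2
ones = [1.0] * n
ones_ok = all(abs(x - lam1 * v) < 1e-12 for x, v in zip(matvec(sigma, ones), ones))

# Each contrast e_1 - e_k (k = 2..n) is an eigenvector with eigenvalue u^2,
# giving n - 1 linearly independent eigenvectors for the second eigenvalue.
lam2 = u2
contrasts_ok = True
for k in range(1, n):
    contrast = [0.0] * n
    contrast[0], contrast[k] = 1.0, -1.0
    image = matvec(sigma, contrast)
    contrasts_ok = contrasts_ok and all(
        abs(x - lam2 * v) < 1e-12 for x, v in zip(image, contrast)
    )

print(n * var_b, lam1, var_c, lam2, ones_ok, contrasts_ok)
```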

Summary and Discussion of Implications for Practice 
It has been shown that coefficient alpha is approximately equal to but 
not necessarily a lower bound to reliability under the RA model, and that it 
is a lower bound to reliability under the MA and FA models (the result for the 
FA model having been shown previously by others). These conclusions concern 
the parameter values for these quantities and not necessarily their sample 
estimates. Under the RA model where statistical inference to a population of 
items from a sample of items is permitted, it was found that the inter-item 
covariances must be homogeneous, and that this homogeneity is due to the model 
components being uncorrelated. This restriction is not required under the MA 
model, but it does not permit psychometric inference. These conclusions are,
of course, specific to the models under consideration, and other models may
yield different results.

It is usually the case in education and psychology that inference from a 
sample of items to a population of items is a desired goal in the analysis of 
test data. However, this may not always be true. A situation in educational 
measurement where psychometric inference may not be required is when a test is 
divisible into well defined content heterogeneous subtests, and the subtest 
scores are the measurements being analyzed. In this situation, an appropriate 
model for the data could be a subtest by examinee two-way MA model. In 
psychology, if an affective scale such as a personality inventory consists of 
well defined psychologically distinct subscales, then a subscales by subjects 
two-way MA model could also be an appropriate model for the data. 

If psychometric inference is desired and if the RA model presented within
is going to be used to analyze the data, then it is appropriate to investigate
whether or not the data satisfy the covariance structure assumed under the RA
model. This covariance structure is a linear covariance structure, and Browne
(1972) has derived a procedure based on the principle of generalized least
squares (GLS) estimation that may be used to statistically test the fit of the
data to the RA model covariance structure. Browne's (1972) method is
non-iterative and hence relatively simple computationally. Jöreskog (1978)
discusses statistical tests for covariance structures based on GLS and maximum
likelihood (ML) estimation methods. The computer program LISREL VI
(Jöreskog and Sörbom, 1986) implements those methods as well as others, and
is accessible through the SPSS X (SPSS X Inc., 1986) computer program. Bentler
(1983) and Browne (1981) have developed GLS test procedures with weaker
distributional assumptions but more computational complexity. Bentler (1985)
has also written a computer program, EQS, which implements his procedure and
is available as part of the BMDP Statistical Software computer package. It is
designed for easy use. If the RA model fits the data, then $\hat{\alpha}$ is an
appropriate estimator for the reliability index, $\mathrm{Rel}(x_{.j}, \tau_j)$,
which assesses how well relationships between observed scores represent
relationships between item domain true scores.

If the items are dichotomously scored, then difficulties may arise in 
applying the above procedures to the usual sample covariance matrix or the 
sample matrix of phi coefficients. Mislevy (1986) discusses these problems 
and reviews alternative methods for testing covariance structures designed to 
deal with dichotomously scored items. However, the results of Collins et al.
(1986) suggest that it may be appropriate to first analyze the usual matrix
of sample moment covariances or correlations. If difficulties arise, then
recourse may be had to the more theoretically and computationally complex
methods discussed by Mislevy (1986).

If the RA model cannot be applied because the data substantially violate
the requirement of homogeneous inter-item covariances, or inference to a
population of items is not desired, then the MA model may be used. As was
shown, $\alpha$ is a lower bound to reliability under the MA model and
consequently under any FA model (the latter having been shown previously by
many others). However, under the MA model, better lower bounds than $\alpha$
exist. The best is the greatest lower bound to reliability, derived
independently by Jackson and Agunwamba (1977) and Bentler and Woodward (1980).
Bentler and Woodward (1983) present the most efficient numerical algorithm for
computing a sample estimate of the greatest lower bound to reliability. In
general terms, the computation requires the solution of a nonlinear
optimization problem with inequality constraints and is rather complex. For
the investigator who desires a simpler estimate, even if it is less optimal,
Jackson and Agunwamba (1977) suggest that Guttman's $\lambda_6$ coefficient may
be advantageous "in the typical situation where the inter-item correlations
are positive, modest in size, and rather similar." The computer package SPSS X
(SPSS X Inc., 1986) has a reliability component which computes a sample
estimate for $\lambda_6$ as well as several other reliability estimates.

If the test has many items, then some investigators may find it difficult
or expensive to compute sample estimates for $\lambda_6$ or the greatest lower
bound. These investigators may view coefficient $\alpha$ as an appealing
reliability index for long tests because of its computational simplicity. Such
investigators may find solace in the results of Green, Lissitz, and Mulaik
(1977) which suggest that $\alpha$ increases as the number of items increases
even when the test has multiple common factors and $\alpha$ is only a strict
lower bound to the parameter value of reliability. Green et al. (1977) argue
that this result makes $\alpha$ a poor index of test unidimensionality.
Fortunately, those qualities which make $\alpha$ a poor index for
unidimensionality increase its worth as a reliability index, and this is
especially true for long tests. Nonetheless, the greatest lower bound to
reliability has optimal properties which indicate that it is worth computing
whenever feasible.

Finally, because coefficient alpha may be a useful estimate of
reliability under both the RA and MA models, it is worthwhile to review the F
distribution theory for $\hat{\alpha}$ under both models. In addition to the
appropriate normality assumptions for each model, the F distribution theory
requires homogeneity of error variances under both ANOVA models and
homogeneity of interaction conditional variances under the RA model, but mild
heterogeneity of these variances should not greatly affect the distribution
theory. Under the RA model, $\alpha$ may equal or approximately equal
reliability when the F distribution for $\hat{\alpha}$ holds, but $\alpha$ is not
a lower bound for reliability. Under the MA model, the F distribution theory
for $\hat{\alpha}$ holds and $\alpha$ equals reliability when there are no
interactions. If interactions are present, then the F distribution theory for
$\hat{\alpha}$ requires the special covariance structures of Scheffé (1959,
p. 264) or Huynh and Feldt (1970) and $\alpha$ is then a strict lower bound to
reliability. If a conservative estimate of $\alpha$ or the parameter value of
reliability under either model is desired, then Woodward and Bentler (1978)
show how the F distribution theory for $\hat{\alpha}$ may be used to obtain a
probabilistic lower bound to $\alpha$.

Table 1

A Comparison Between Variance Components for the RA and MA
Models Under the Indicated Covariance Structure for Both Models

RA Model                                            MA Model

$\Sigma_t = qJ + \Delta(u_i^2)$                     $\Sigma_t = qJ + \Delta(u_i^2)$

$\sigma^2(b) = q$                                   $\sigma^2(b) = q + (1/n^2)\sum_i u_i^2$

$\mathrm{Var}_j(c_{ij}) = \sigma^2(c_i) = u_i^2$    $\mathrm{Var}_j(c_{ij}) = [(n-2)u_i^2 + (1/n)\sum_i u_i^2]/n$

$\sigma^2(c) = E_i[\sigma^2(c_i)] = E_i(u_i^2)$     $\sigma^2(c) = [1/(n-1)]\sum_i \mathrm{Var}_j(c_{ij}) = [1/n]\sum_i u_i^2$